---
title: Exploring Object Detection using Icevision w/ FastAI
keywords: fastai
sidebar: home_sidebar
nb_path: "20_subcoco_ivf.ipynb"
---

Download a Sample of COCO Data

The full COCO dataset is huge (~50GB). For self-education in object detection, with the intention of using a pretrained model for transfer learning, it is not practical to deal with a dataset this big as a first project. Luckily, the kind folks at fast.ai have prepared some convenient subsets: the medium-sized 3GB https://s3.amazonaws.com/fast-ai-coco/coco_sample.tgz seems like a good candidate. The 800KB http://files.fast.ai/data/examples/coco_tiny.tgz, on the other hand, seems way too small, and may not have enough data for adequate training.

{% raw %}
{% endraw %}

If playing with the tiny COCO subset, use these values:

froot = "coco_tiny"
fname = f"{froot}.tgz"
url = f"http://files.fast.ai/data/examples/{fname}"
json_fname = datadir/froot/'train.json'  # datadir is a pathlib.Path defined earlier in the notebook
img_dir = datadir/froot/'train'
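The actual download happens in a cell not shown here. As a rough stdlib-only sketch of the fetch-and-extract step (the `fetch_and_extract` helper below is mine, not part of the notebook):

```python
import tarfile
import urllib.request
from pathlib import Path

def fetch_and_extract(url: str, datadir: Path) -> Path:
    """Download a .tgz archive into datadir (if not already cached) and extract it."""
    datadir.mkdir(parents=True, exist_ok=True)
    archive = datadir / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)  # skip re-download on reruns
    with tarfile.open(archive) as tf:
        tf.extractall(datadir)
    # the fast.ai archives extract to a directory named after the archive
    return datadir / archive.stem
```

In practice fastai's own `untar_data(url)` does the same caching and extraction in one call.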

Check Annotations

Let's load and inspect the annotation file that comes with the COCO tiny dataset...

{% raw %}
{% endraw %} {% raw %}
train_json['categories'], train_json['images'][0], [a for a in train_json['annotations'] if a['image_id']==train_json['images'][0]['id'] ]
([{'id': 62, 'name': 'chair'},
  {'id': 63, 'name': 'couch'},
  {'id': 72, 'name': 'tv'},
  {'id': 75, 'name': 'remote'},
  {'id': 84, 'name': 'book'},
  {'id': 86, 'name': 'vase'}],
 {'id': 318219, 'file_name': '000000318219.jpg'},
 [{'image_id': 318219,
   'bbox': [505.24, 0.0, 47.86, 309.25],
   'category_id': 72},
  {'image_id': 318219,
   'bbox': [470.68, 0.0, 45.93, 191.86],
   'category_id': 72},
  {'image_id': 318219,
   'bbox': [442.51, 0.0, 43.39, 119.87],
   'category_id': 72}])
{% endraw %}
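Note that COCO `bbox` values are `[x, y, width, height]` in pixels (top-left corner plus size), whereas many detection tools work with `[x_min, y_min, x_max, y_max]` corners. A tiny conversion helper (hypothetical, for illustration only):

```python
def coco_to_xyxy(bbox):
    """Convert a COCO-style [x, y, w, h] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]
```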

Digest the Dataset for useful Stats

Do some basic analysis of the data to get numbers like total images, boxes, and average box count per image...

{% raw %}
{% endraw %} {% raw %}
print(
    f"Categories {stats.num_cats}, Images {stats.num_imgs}, Boxes {stats.num_bboxs}, avg (w,h) {(stats.avg_width, stats.avg_height)}, "
    f"avg cats/img {stats.avg_ncats_per_img:.1f}, avg boxs/img {stats.avg_nboxs_per_img:.1f}, avg boxs/cat {stats.avg_nboxs_per_cat:.1f}.")

print(f"Image means by channel {stats.chn_means}, std.dev by channel {stats.chn_stds}")
stats.lbl2name, stats.lbl2cat, stats.cat2lbl, stats.lbl2name
Categories 6, Images 21837, Boxes 87106, avg (w,h) (575.6857626963424, 481.71420066859), avg cats/img 7.0, avg boxs/img 4.0, avg boxs/cat 14517.7.
Image means by channel [115.64436835 103.2992867   91.73613059], std.dev by channel [64.16724017 62.63021182 61.92975836]
({1: 'chair', 2: 'couch', 3: 'tv', 4: 'remote', 5: 'book', 6: 'vase'},
 {1: 62, 2: 63, 3: 72, 4: 75, 5: 84, 6: 86, 0: 0},
 {62: 1, 63: 2, 72: 3, 75: 4, 84: 5, 86: 6, 0: 0},
 {1: 'chair', 2: 'couch', 3: 'tv', 4: 'remote', 5: 'book', 6: 'vase'})
{% endraw %}
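The `stats` object is built by a helper elsewhere in the notebook, but the core counts can be derived directly from the COCO JSON. A rough sketch, using the field names from the annotation format shown earlier (the `coco_stats` function is mine, not the notebook's):

```python
def coco_stats(train_json):
    """Compute basic counts from a COCO-style annotation dict."""
    num_cats = len(train_json["categories"])
    num_imgs = len(train_json["images"])
    anns = train_json["annotations"]
    num_bboxs = len(anns)
    # distinct categories seen per image
    cats_per_img = {}
    for a in anns:
        cats_per_img.setdefault(a["image_id"], set()).add(a["category_id"])
    return dict(
        num_cats=num_cats,
        num_imgs=num_imgs,
        num_bboxs=num_bboxs,
        avg_ncats_per_img=sum(map(len, cats_per_img.values())) / max(num_imgs, 1),
        avg_nboxs_per_img=num_bboxs / max(num_imgs, 1),
        avg_nboxs_per_cat=num_bboxs / max(num_cats, 1),
    )
```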

Load Data Using Custom Parser

To prevent bounding boxes from sitting too close to the image margin, or from being too small, especially after augmentation transforms, I would set min_margin_ratio = 0.05 and min_width_height_ratio = 0.05.

However, IceVision 2.0 now has autofix, which should address these issues; it does take a long time to run though...
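`min_margin_ratio` and `min_width_height_ratio` are names from this notebook's custom parser; the exact rule it applies may differ, but the idea can be sketched as a per-box predicate (the `keep_box` helper is illustrative, not the notebook's code):

```python
def keep_box(bbox, img_w, img_h,
             min_margin_ratio=0.05, min_width_height_ratio=0.05):
    """Reject boxes that hug the image border or are tiny relative to the image."""
    x, y, w, h = bbox  # COCO format: top-left corner plus width/height
    margin_x = img_w * min_margin_ratio
    margin_y = img_h * min_margin_ratio
    inside = (x >= margin_x and y >= margin_y and
              x + w <= img_w - margin_x and y + h <= img_h - margin_y)
    big_enough = (w >= img_w * min_width_height_ratio and
                  h >= img_h * min_width_height_ratio)
    return inside and big_enough
```

Filtering like this before training explains why a fraction of the images end up skipped.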

{% raw %}
Skipped 5362 out of 21837 images

{% endraw %}

Show a few sample images with their corresponding labels and boxes:

{% raw %}
class_map = ClassMap(list(stats.lbl2name.values()))
show_records(train_records[:4], ncols=2, class_map=class_map, show=True)
{% endraw %}

Create Transforms, Model, Training and Validation Dataloaders, Learners

  • Define transforms - using Albumentations transforms out of the box.
  • Use them to construct Datasets and Dataloaders.
  • Make a Learner
{% raw %}

gen_transforms_and_learner[source]

gen_transforms_and_learner(img_size=128, bs=4, acc_cycs=8)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
inf_tfms, learn, backbone_name = gen_transforms_and_learner(img_size=512, bs=4, acc_cycs=8)
{% endraw %}

I have experimented with other models available out of the box in IceVision, but efficientdet works best. You can replace backbone_name, backbone, and model with the following values to test alternatives.

backbone_name

  • "resnet_fpn.resnet18"

backbone

  • backbones.resnet_fpn.resnet18(pretrained=True)
  • backbones.resnet_fpn.resnet34(pretrained=True)
  • backbones.resnet_fpn.resnet50(pretrained=True) # Default
  • backbones.resnet_fpn.resnet101(pretrained=True)
  • backbones.resnet_fpn.resnet152(pretrained=True)
  • backbones.resnet_fpn.resnext50_32x4d(pretrained=True)
  • backbones.resnet_fpn.resnext101_32x8d(pretrained=True)
  • backbones.resnet_fpn.wide_resnet50_2(pretrained=True)
  • backbones.resnet_fpn.wide_resnet101_2(pretrained=True)

model

  • faster_rcnn.model(backbone=backbone, num_classes=len(stats.lbl2name))

Train using FastAI

{% raw %}
learn.lr_find()
SuggestedLRs(lr_min=0.33113112449646, lr_steep=3.981071586167673e-06)
{% endraw %} {% raw %}

run_training[source]

run_training(learn:Learner, min_lr=0.05, head_runs=1, full_runs=1)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
run_training(learn, head_runs=1, full_runs=0)
Training for 1+0 epochs at min LR 0.05
epoch train_loss valid_loss COCOMetric time
0 nan nan 0.027871 32:53
Better model found at epoch 0 with COCOMetric value: 0.027870522049878176.
{% endraw %}

Inference

{% raw %}
infer_ds = Dataset(valid_records[:4], inf_tfms)
infer_dl = efficientdet.infer_dl(infer_ds, batch_size=4, shuffle=True)
samples, preds = efficientdet.predict_dl(learn.model, infer_dl)
imgs = [sample["img"] for sample in samples]
show_preds(
    imgs=imgs[:4],
    preds=preds[:4],
    class_map=class_map,
    denormalize_fn=denormalize_imagenet,
    ncols=1,
    figsize=(36,27)
)

{% endraw %}

As you can see, training for so few epochs does not produce a usable model.

Saving Final Model Explicitly

Saving it explicitly after all the epochs.

{% raw %}

save_final[source]

save_final(save_model_fpath:str)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
final_saved_model_fpath = f"models/{backbone_name}-subcoco-final.pth"
save_final(final_saved_model_fpath)
{% endraw %}

Inference w/ Pretrained Model

Construct a fresh model and load the saved weights into it.

{% raw %}
pretrained_model = efficientdet.model(model_name=backbone_name, num_classes=len(stats.lbl2name), img_size=512)
pretrained_model.load_state_dict(torch.load(final_saved_model_fpath))
<All keys matched successfully>
{% endraw %}

Run inference on a batch of four validation images...

{% raw %}
infer_ds = Dataset(valid_records[128:132], inf_tfms)
infer_dl = efficientdet.infer_dl(infer_ds, batch_size=4, shuffle=False)
samples, preds = efficientdet.predict_dl(pretrained_model.cuda(), infer_dl)
imgs = [sample["img"] for sample in samples]
show_preds(
    imgs=imgs[:4],
    preds=preds[:4],
    class_map=class_map,
    denormalize_fn=denormalize_imagenet,
    ncols=1,
    figsize=(36,27)
)

{% endraw %}